One goal of statistical privacy research is to construct a data release mechanism that
protects individual privacy while preserving information content. An example is a
random mechanism that takes an input database $X$ and outputs a random database $Z$
according to a distribution $Q_n(\cdot \mid X)$. Differential privacy is a particular privacy
requirement developed by computer scientists in which $Q_n(\cdot \mid X)$ is required to be
insensitive to changes in one data point in $X$. This makes it difficult to infer from $Z$ whether
a given individual is in the original database $X$. We consider differential privacy from
a statistical perspective. We consider several data release mechanisms that satisfy the
differential privacy requirement. We show that it is useful to compare these schemes
by computing the rate of convergence of distributions and densities constructed from
the released data. We study a general privacy method, called the exponential mechanism,
introduced by McSherry and Talwar (2007). We show that the accuracy of this
method is intimately linked to the rate at which the empirical distribution concentrates
in a small ball around the true distribution.



1 Introduction 



One goal of data privacy research is to derive a mechanism that takes an input database X and
releases a transformed database Z such that individual privacy is protected yet information content
is preserved. This is known as disclosure limitation. In this paper we will consider various methods
for producing a transformed database Z and we will study the accuracy of inferences from Z under
various loss functions.

There are numerous approaches to this problem. The literature is vast and includes papers from
computer science, statistics and other fields. The terminology also varies considerably. We will
use the terms "disclosure limitation" and "privacy guarantee" interchangeably.



Disclosure limitation methods include clustering (Sweeney, 2002; Aggarwal et al., 2006),
$\ell$-diversity (Machanavajjhala et al., 2006), t-closeness (Li et al., 2007), data swapping
(Fienberg and McIntyre, 2004), matrix masking (Ting et al., 2008), cryptographic approaches
(Pinkas, 2002; Feigenbaum et al., 2006), data perturbation (Evfimievski et al., 2004;
Kim and Winkler, 2003; Warner, 1965; Fienberg et al., 1998) and distributed database methods
(Fienberg et al., 2007; Sanil et al., 2004). Statistical references on disclosure risk and
limitation include Duncan and Lambert (1986, 1989), Duncan and Pearson (1991), Reiter (2005).
We refer to Reiter (2005) and Sanil et al. (2004) for further references.



One approach to defining a privacy guarantee that has received much attention in the computer
science literature is known as differential privacy (Dwork et al., 2006; Dwork, 2006). There is a
large body of work on this topic including, for example, Dinur and Nissim (2003), Dwork
and Nissim (2004), Blum et al. (2005), Dwork et al. (2007), Nissim et al. (2007), Barak et al.
(2007), McSherry and Talwar (2007), Blum et al. (2008), Kasiviswanathan et al. (2008).
Blum et al. (2008) gives a machine learning approach to inference under differential privacy
constraints and to some extent our results are inspired by that paper. Smith (2008) shows how to
provide efficient point estimators while preserving differential privacy. He constructs estimators
for parametric models with mean squared error $(1 + o(1))/(n I(\theta))$ where $I(\theta)$ is the
Fisher information. Machanavajjhala et al. (2008) consider privacy for histograms by sampling
from the posterior distribution of the cell probabilities. We discuss Machanavajjhala et al. (2008)
further in Section 4. After submitting the first draft of this paper, new work has appeared on
differential privacy that is also statistical in nature, namely, Ghosh et al. (2009), Dwork and
Lei (2009), Dwork et al. (2009), Feldman et al. (2009).



The goals of this paper are to explain differential privacy in statistical language, to show how to
compare different privacy mechanisms by computing the rate of convergence of distributions and
densities based on the released data Z, and to study a general privacy method, called the exponential
mechanism, due to McSherry and Talwar (2007). We show that the accuracy of this method is
intimately linked to the rate at which the empirical distribution concentrates in
a small ball around the true distribution. These so-called "small ball probabilities" are well studied
in probability theory. To the best of our knowledge, this is the first time a connection has been
made between differential privacy and small ball probabilities. We need to make two disclaimers.
First, the goal of our paper is to investigate differential privacy. We will not attempt to review all
approaches to privacy or to compare differential privacy with other approaches. Such an undertaking
is beyond the scope of this paper. Second, we focus only on statistical properties here. We
shall not concern ourselves in this paper with computational efficiency.

In Section 2 we define differential privacy and provide motivation for the definition. In Section
3 we discuss conditions that ensure that a privacy mechanism preserves information. In Section 4
we consider two histogram based methods. In Sections 5 and 6 we examine another method known
as the exponential mechanism. Section 7 contains a small simulation study and Section 8 contains
concluding remarks. All technical proofs appear in Section 9.

1.1 Summary of Results 

We consider several different data release mechanisms that satisfy differential privacy. We measure
the utility of these mechanisms by the rate at which $d(P, \hat{P}_Z)$ goes to 0, where $P$ is
the distribution of the data $X \in \mathcal{X}$, $\hat{P}_Z$ is the empirical distribution of the
released data $Z$, and $d$ is some distance between distributions. This gives an informative way to
compare data release mechanisms. In more detail, we consider the Kolmogorov-Smirnov (KS) distance
$\sup_x |F(x) - \hat{F}_Z(x)|$, where $F$ and $\hat{F}_Z$ denote the cumulative distribution function
(cdf) corresponding to $P$ and the empirical distribution function corresponding to $\hat{P}_Z$,
respectively. We also consider the squared $L_2$ distance $\int (p(x) - \hat{p}_Z(x))^2\,dx$, where
$\hat{p}_Z$ is a density estimator based on $Z$. Our results are summarized
in the following tables, where $n$ denotes the sample size.

The first table concerns the case where the data are in $\mathbb{R}^r$ and the density $p$ of $P$ is
Lipschitz. Also reported are the minimax rates of convergence for density estimators in KS and in
squared $L_2$ distances. We see that the accuracy depends both on the data release mechanism and on
the distance function $d$. The results are from Sections 4 and 5 of the paper. (The exponential
mechanism under $L_2$ distance is marked NA but is in the second table in the case $r = 1$. We note
that the rate for KS distance for the perturbed histogram is $\sqrt{\log n / n}$ for $r = 1$.)



                      Data release mechanism
Distance              Smoothed histogram                    Perturbed histogram             Exponential mechanism   Minimax rate
$L_2$                 $n^{-2/(2r+3)}$                       $n^{-2/(2+r)}$                  NA                      $n^{-2/(2+r)}$
Kolmogorov-Smirnov    $\sqrt{\log n} \times n^{-2/(6+r)}$   $\log n \times n^{-2/(2+r)}$    $n^{-1/3}$              $n^{-1/2}$



The next table summarizes the results for the case where the dimension of $\mathcal{X}$ is $r = 1$
and the density $p$ is assumed to be in a Sobolev space of order $\gamma$. We only consider the
squared $L_2$ distance between the true density $p$ and the estimated density $\hat{p}_Z$ in this
case. The results are from Section 6 of the paper.





Distance   Exponential mechanism        Perturbed orthogonal series estimator   Minimax rate
$L_2$      $n^{-\gamma/(2\gamma+1)}$    $n^{-2\gamma/(2\gamma+1)}$              $n^{-2\gamma/(2\gamma+1)}$



Our results show that, in general, privacy schemes seem not to yield minimax rates. Two 
exceptions are perturbation methods evaluated under L2 loss which do yield minimax rates. An 
open question is whether the slower than minimax rates are intrinsic to the privacy methods. It is 
possible, for example, that our rates are not tight. This question could be answered by establishing 
lower bounds on these rates. We consider this an important topic for future research. 

2 Differential Privacy 

Let $X_1, \dots, X_n$ be a random sample (independent and identically distributed) of size $n$ from a
distribution $P$ where $X_i \in \mathcal{X}$. To be concrete, we shall assume that
$\mathcal{X} = [0,1]^r = [0,1] \times [0,1] \times \cdots \times [0,1]$ for some integer $r \ge 1$.
Extensions to more general sample spaces are certainly possible but we focus on this sample space to
avoid unnecessary technicalities. (In particular, it is difficult to extend differential privacy to
unbounded domains.) Let $\mu$ denote Lebesgue measure and let $p = dP/d\mu$ if the density exists. We
call $X = (X_1, \dots, X_n)$ a database. Note that
$X \in \mathcal{X}^n = [0,1]^r \times \cdots \times [0,1]^r$. We focus on mechanisms that take a
database $X$ as input and output a sanitized database $Z = (Z_1, \dots, Z_k) \in \mathcal{X}^k$ for
public release. In general, $Z$ need not be the same size as $X$. For some schemes, we shall see that
large $k$ can lead to low privacy and high accuracy while small $k$ can lead to high privacy and low
accuracy. We will let $k = k(n)$ change with $n$. Hence, any asymptotic statements involving $n$
increasing will also allow $k$ to change as well.

A data release mechanism $Q_n(\cdot \mid X)$ is a conditional distribution for
$Z = (Z_1, \dots, Z_k)$ given $X$. Thus, $Q_n(B \mid X = x)$ is the probability that the output
database $Z$ is in a set $B \in \mathcal{B}$ given that the input database is $x$, where
$\mathcal{B}$ are the measurable subsets of $\mathcal{X}^k$. We call $Z = (Z_1, \dots, Z_k)$ a
sanitized database. Schematically:

input database $X = (X_1, \dots, X_n)$  --sanitize-->  output database $Z = (Z_1, \dots, Z_k)$.

The marginal distribution of the output database $Z$ induced by $P$ and $Q_n$ is
$M_n(B) = \int Q_n(B \mid X = x)\,dP^n(x)$ where $P^n$ is the n-fold product measure of $P$.

Example 2.1. A simple example to keep in mind is adding noise. In this case,
$Z = (Z_1, \dots, Z_n)$ where $Z_i = X_i + \epsilon_i$ and $\epsilon_1, \dots, \epsilon_n$ are mean
zero independent observations drawn from some known distribution $H$ with density $h$. Hence $Q_n$
has density $q_n(z_1, \dots, z_n \mid x_1, \dots, x_n) = \prod_{i=1}^n h(z_i - x_i)$.

Definition 2.2. Given two databases $X = (X_1, \dots, X_n)$ and $Y = (Y_1, \dots, Y_n)$, let
$\delta(X, Y)$ denote the Hamming distance between $X$ and $Y$:
$\delta(X, Y) = \#\{i : X_i \ne Y_i\}$.
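
As a small illustration of Definition 2.2, the following sketch counts the entries in which two databases differ. The function name `hamming_distance` and the toy databases are illustrative assumptions, not part of the paper.

```python
def hamming_distance(x, y):
    """Number of positions i at which the databases x and y differ."""
    if len(x) != len(y):
        raise ValueError("databases must have the same size n")
    return sum(1 for xi, yi in zip(x, y) if xi != yi)


# Two toy databases that differ in exactly one entry (hypothetical values).
x = [0.10, 0.25, 0.70]
y = [0.10, 0.90, 0.70]
print(hamming_distance(x, y))
```

Databases at Hamming distance 1, such as `x` and `y` here, are exactly the neighbouring pairs over which the privacy definitions below take a supremum.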

A general data release mechanism is the exponential mechanism (McSherry and Talwar, 2007),
which is defined as follows. Let $\xi : \mathcal{X}^n \times \mathcal{X}^k \to [0, \infty)$ be any
function. Each such $\xi$ defines a different exponential mechanism. Let

$\Delta = \Delta_{n,k} = \sup_{x, y \in \mathcal{X}^n :\, \delta(x,y)=1}\ \sup_{z \in \mathcal{X}^k} |\xi(x,z) - \xi(y,z)|$,   (1)

that is, $\Delta_{n,k}$ is the maximum change to $\xi$ caused by altering a single entry in $x$.
Finally, let $(Z_1, \dots, Z_k)$ be a random vector drawn from the density

$h(z \mid x) = \dfrac{\exp\left(-\frac{\alpha\,\xi(x,z)}{2\Delta_{n,k}}\right)}{\int_{\mathcal{X}^k} \exp\left(-\frac{\alpha\,\xi(x,s)}{2\Delta_{n,k}}\right) ds}$   (2)

where $\alpha > 0$, $z = (z_1, \dots, z_k)$ and $x = (x_1, \dots, x_n)$. In this case, $Q_n$ has
density $h(z \mid x)$. We will discuss the exponential mechanism in more detail later.
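
To make the definition concrete, here is a minimal sketch of an exponential mechanism in which the output space is replaced by a finite list of candidates, so the normalising integral becomes a sum. The finite grid, the function names, and the choice of $\xi$ as the gap between the data mean and a single released value are all assumptions made for illustration; the paper works with the continuous density.

```python
import math
import random


def exponential_mechanism(x, candidates, xi, delta, alpha, rng=random):
    """Draw one candidate z with probability proportional to
    exp(-alpha * xi(x, z) / (2 * delta)).

    `candidates` is a finite list standing in for the output space (an
    illustrative assumption); `delta` must bound the change in xi(x, z)
    when a single entry of x is altered.
    """
    scores = [-alpha * xi(x, z) / (2.0 * delta) for z in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def mean_gap(x, z):
    """Hypothetical xi: |mean(x) - z| for a single released value z (k = 1)."""
    return abs(sum(x) / len(x) - z)


x = [0.2, 0.4, 0.5, 0.9]
grid = [i / 100 for i in range(101)]
# For data in [0, 1], changing one entry moves the mean by at most 1/n.
z = exponential_mechanism(x, grid, mean_gap, delta=1 / len(x), alpha=1.0)
print(z)
```

Small values of `alpha` flatten the weights toward the uniform distribution on the grid, while large values concentrate the draw near the minimiser of $\xi(x, \cdot)$.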

There are many definitions of privacy but in this paper we focus on the following definition due
to Dwork et al. (2006) and Dwork (2006).

Definition 2.3. Let $\alpha > 0$. We say that $Q_n$ satisfies $\alpha$-differential privacy if

$\sup_{x, y \in \mathcal{X}^n :\, \delta(x,y)=1}\ \sup_{B \in \mathcal{B}} \dfrac{Q_n(B \mid X = x)}{Q_n(B \mid X = y)} \le e^{\alpha}$   (3)

where $\mathcal{B}$ are the measurable sets on $\mathcal{X}^k$. The ratio is interpreted to be 1
whenever the numerator and denominator are both 0.

The definition of differential privacy is based on ratios of probabilities. It is crucial to measure
closeness by ratios of probabilities since that protects rare cases which have small probability
under $Q_n$. In particular, if changing one entry in the database $X$ cannot change the probability
distribution $Q_n(\cdot \mid X = x)$ very much, then we can claim that a single individual cannot
guess whether he is in the original database or not. The closer $e^{\alpha}$ is to 1, the stronger
the privacy guarantee is. Thus, one typically chooses $\alpha$ close to 0. See Dwork et al. (2006)
for more discussion on these points. Indeed, suppose that two subjects each believe that one of them
is in the original database. Given $Z$ and full knowledge of $P$ and $Q_n$, can they test who is in
$X$? The answer is given in the following result. (In this result, we drop the assumption that the
user does not know $Q_n$.)

Theorem 2.4. Suppose that $Z$ is obtained from a data release mechanism that satisfies
$\alpha$-differential privacy. Any level $\gamma$ test which is a function of $Z$, $P$ and $Q_n$ of
$H_0 : X_i = s$ versus $H_1 : X_i = t$ has power bounded above by $\gamma e^{\alpha}$.
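
To get a feel for the size of this bound, the short sketch below evaluates $\gamma e^{\alpha}$ for a level $\gamma = 0.05$ test at a few privacy levels. The helper name `max_power` and the chosen values of $\alpha$ are illustrative assumptions.

```python
import math


def max_power(level, alpha):
    """Upper bound level * e^alpha on the power of a level-`level` test,
    capped at 1 because power is a probability."""
    return min(1.0, level * math.exp(alpha))


for alpha in (0.01, 0.1, 1.0):
    print(alpha, round(max_power(0.05, alpha), 4))
```

For $\alpha = 0.1$ the bound is about 0.0553, so the power of the test can exceed its level by only about ten percent of that level.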



Thus, if $Q_n$ satisfies differential privacy then it is virtually impossible to test the hypothesis
that either of the two subjects is in the database, since the power of such a test is nearly equal to
its level. A similar calculation shows that if one does a Bayes test between $H_0$ and $H_1$ then the
Bayes factor is always between $e^{-\alpha}$ and $e^{\alpha}$. For more detail on the motivation for
the definition as well as consequences, see Dwork et al. (2006), Dwork (2006), Ganta et al. (2008),
Rastogi et al. (2009).

The following result, which is proved in McSherry and Talwar (2007) (Theorem 6), shows that
the exponential mechanism always preserves differential privacy.

Theorem 2.5. (McSherry and Talwar, 2007) The exponential mechanism satisfies
$\alpha$-differential privacy.

To conclude this section we record a few useful facts. Let $T(X, R)$ be a function of $X$ and some
auxiliary random variable $R$ which is independent of $X$. After including this auxiliary random
variable we define differential privacy as before. Specifically, $T(X, R)$ satisfies differential
privacy if for all $B$, and all $x, x'$ with $\delta(x, x') = 1$, we have that
$P(T(X, R) \in B \mid X = x) \le e^{\alpha} P(T(X, R) \in B \mid X = x')$. The third part is
Proposition 1 from Dwork et al. (2006).

Lemma 2.6. We have the following:

1. If $T(X, R)$ satisfies differential privacy then $U = h(T(X, R))$ also satisfies differential
privacy for any measurable function $h$.

2. Suppose that $g$ is a density function constructed from a random vector $T(X, R)$ that satisfies
differential privacy. Let $Z = (Z_1, \dots, Z_k)$ be $k$ iid draws from $g$. This defines a mechanism
$Q_n(B \mid X) = P(Z \in B \mid X)$. Then $Q_n$ satisfies differential privacy for any $k$.

3. (Proposition 1 from Dwork et al. (2006).) Let $f(x)$ be a function of $x = (x_1, \dots, x_n)$ and
define $S(f) = \sup_{x, x' :\, \delta(x,x')=1} \|f(x) - f(x')\|_1$ where $\|a\|_1 = \sum_j |a_j|$.
Let $R$ have density $g(r) \propto e^{-\alpha \|r\|_1 / S(f)}$. Then $T(X, R) = f(X) + R$ satisfies
differential privacy.
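
The third part of the lemma can be sketched for a scalar statistic. The example below releases the mean of data in $[0, 1]$, for which changing one entry moves the mean by at most $1/n$, so the noise is Laplace with scale $S(f)/\alpha = 1/(n\alpha)$. The function names and the choice of the mean as $f$ are assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng=random):
    """One Laplace(0, scale) draw, generated as the difference of two
    independent exponential variables with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(x, alpha, rng=random):
    """Release mean(x) + R for data in [0, 1], where R has density
    proportional to exp(-alpha * |r| / S(f)) with S(f) = 1/n."""
    n = len(x)
    sensitivity = 1.0 / n  # one entry in [0, 1] moves the mean by at most 1/n
    return sum(x) / n + laplace_noise(sensitivity / alpha, rng)


data = [0.2, 0.4, 0.5, 0.9]
print(private_mean(data, alpha=0.5))
```

The noise scale shrinks like $1/n$, so for a fixed $\alpha$ the released mean differs from the sample mean by $O_P(1/n)$, which is smaller than the sampling error of the mean itself.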
3 Informative Mechanisms

A challenge in privacy theory is to find $Q_n$ that satisfies differential privacy and yet yields
datasets $Z$ that preserve information. Informally, a mechanism is informative if it is possible to
make precise inferences from the released data $Z_1, \dots, Z_k$. Whether or not a mechanism is
informative will depend on the goals of the inference. From a statistical perspective, we would like
to infer $P$ or functionals of $P$ from $Z$. Blum et al. (2008) show that the probability content of
some classes of intervals can be estimated accurately while preserving privacy. Their results
motivated the current paper. We will assume throughout that the user has access to the sanitized
data $Z$ but not the mechanism $Q_n$. The question of how a data analyst can use knowledge of $Q_n$
to improve inferences is left to future work.

There are many ways to measure the information in $Z$. One way is through distribution functions.
Let $F$ denote the cumulative distribution function (cdf) on $\mathcal{X}$ corresponding to $P$. Thus
$F(x) = P(X \in (-\infty, x_1] \times \cdots \times (-\infty, x_r])$ where $x = (x_1, \dots, x_r)$.
Let $\hat{F} = \hat{F}_X$ denote the empirical distribution function corresponding to $X$ and
similarly let $\hat{F}_Z$ denote the empirical distribution function corresponding to $Z$. Let $\rho$
denote any distance measure on distribution functions.

Definition 3.1. $Q_n$ is consistent with respect to $\rho$ if $\rho(F, \hat{F}_Z) \to 0$ in
probability. $Q_n$ is $\epsilon_n$-informative if $\rho(F, \hat{F}_Z) = O_P(\epsilon_n)$.



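An alternative to requiring $\rho(F, \hat{F}_Z)$ to be small is to require
$\rho(\hat{F}, \hat{F}_Z)$ to be small. Or one could require
$Q_n(\rho(\hat{F}, \hat{F}_Z) > \epsilon \mid X = x)$ to be small for all $x$ as in
Blum et al. (2008). These requirements are similar. Indeed, suppose $\rho$ satisfies the triangle
inequality and that $\hat{F}$ is consistent in the $\rho$ distance, that is,
$\rho(F, \hat{F}) \to 0$ in probability. Assume further that $\rho(F, \hat{F}) = O_P(\epsilon_n)$.
Then $\rho(\hat{F}, \hat{F}_Z) = O_P(\epsilon_n)$ implies that

$\rho(F, \hat{F}_Z) \le \rho(F, \hat{F}) + \rho(\hat{F}, \hat{F}_Z) = O_P(\epsilon_n)$.

Similarly, $\rho(F, \hat{F}_Z) = O_P(\epsilon_n)$ implies that
$\rho(\hat{F}, \hat{F}_Z) = O_P(\epsilon_n)$.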

Let $E_{P, Q_n}$ denote the expectation under the joint distribution defined by $P^n$ and $Q_n$.
Sometimes we write $E$ when there is no ambiguity. Similarly, we use $\mathbb{P}$ to denote the
marginal probability under $P^n$ and $Q_n$:
$\mathbb{P}(A) = \int \cdots \int \int_A dQ_n(z_1, \dots, z_k \mid x_1, \dots, x_n)\,dP(x_1) \cdots dP(x_n)$
for measurable $A \subseteq \mathcal{X}^k$.

There are many possible choices for $\rho$. We shall mainly focus on the Kolmogorov-Smirnov
(KS) distance $\rho(F, G) = \sup_x |F(x) - G(x)|$ and the squared $L_2$ distance
$\rho(F, G) = \int (f(x) - g(x))^2\,dx$ where $f = dF/d\mu$ and $g = dG/d\mu$. However, our results
can be carried over to other distances as well.
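
For one-dimensional data ($r = 1$), the KS distance between two empirical distribution functions can be computed exactly by checking the pooled data points. The sketch below does this with plain Python; the function names are illustrative assumptions.

```python
def empirical_cdf(sample):
    """Return the empirical distribution function of a one-dimensional sample."""
    data = sorted(sample)
    n = len(data)

    def cdf(t):
        # Fraction of observations less than or equal to t.
        return sum(1 for v in data if v <= t) / n

    return cdf


def ks_distance(sample_a, sample_b):
    """sup_t |F_a(t) - F_b(t)| between two empirical cdfs (r = 1).

    Both functions are right-continuous step functions that jump only at
    data points, so the supremum is attained at one of the pooled points.
    """
    fa, fb = empirical_cdf(sample_a), empirical_cdf(sample_b)
    return max(abs(fa(t) - fb(t)) for t in list(sample_a) + list(sample_b))


print(ks_distance([0.1, 0.5], [0.3, 0.7]))
```

The same function can be used to compare an original database with a sanitized one, which is the quantity $\rho(\hat{F}, \hat{F}_Z)$ discussed above.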

Before proceeding, let us note that we will need some assumptions on $F$; otherwise we cannot
have a consistent scheme. The following theorem, essentially a re-expression of a result in
Blum et al. (2008) in our framework, makes this clear.

Theorem 3.2. Suppose that $Q_n$ satisfies differential privacy and that
$\rho(F, G) = \sup_x |F(x) - G(x)|$. Let $F$ be a point mass distribution. Thus $F(y) = I(y \ge x)$
for some point $x \in [0, 1]$. Then $\hat{F}_Z$ is inconsistent, that is, there is a $\delta > 0$
such that $\liminf_{n \to \infty} P^n(\rho(F, \hat{F}_Z) > \delta) > 0$.



4 Sampling From a Histogram 

The goal of this section is to give two concrete, simple data release methods that achieve
differential privacy. The idea is to draw a random sample from a histogram. The first scheme draws
observations from a smoothed histogram. The second scheme draws observations from a randomly
perturbed histogram. We use the histogram for its familiarity and simplicity and because it is used
in applications of differential privacy. We will see that the histogram has to be carefully
constructed to ensure differential privacy. We then compare the two schemes by studying the accuracy
of the inferences from the released data. We will see that the accuracy depends both on how the
histogram is constructed and on what measure of accuracy we use.

Let $L > 0$ be a constant and suppose that $p = dP/d\mu \in \mathcal{P}$ where

$\mathcal{P} = \{ p : |p(x) - p(y)| \le L \|x - y\| \text{ for all } x, y \in \mathcal{X} \}$   (4)

is the class of Lipschitz functions. We assume throughout this section that $p \in \mathcal{P}$. The
minimax rate of convergence for density estimators in squared $L_2$ distance for $\mathcal{P}$ is
$n^{-2/(2+r)}$ (Scott, 1992).
Let $h = h_n$ be a binwidth such that $0 < h < 1$ and such that $m = 1/h^r$ is an integer. Partition
$\mathcal{X}$ into $m$ bins $\{B_1, \dots, B_m\}$ where each bin $B_j$ is a cube with sides of length
$h$. Let $I(\cdot)$ denote the indicator function. Let $\hat{f}_m$ denote the corresponding histogram
estimator on $\mathcal{X}$, namely,

$\hat{f}_m(x) = \sum_{j=1}^m \dfrac{\hat{p}_j}{h^r} I(x \in B_j)$

where $\hat{p}_j = C_j/n$ and $C_j = \sum_{i=1}^n I(X_i \in B_j)$ is the number of observations in
$B_j$. Recall that $\hat{f}_m$ is a consistent estimator of $p$ if $h = h_n \to 0$ and
$n h_n^r \to \infty$. Also, the optimal choice of $m = m_n$ for $L_2$ error under $\mathcal{P}$ is
$m_n \asymp n^{r/(2+r)}$, in which case $\int (p - \hat{f}_m)^2 = O_P(n^{-2/(2+r)})$ (Scott, 1992).
Here, $a_n \asymp b_n$ means that both $a_n/b_n$ and $b_n/a_n$ are bounded for large $n$.

4.1 Sampling from a Smoothed Histogram 

The first method for generating released data $Z$ from a histogram while achieving differential
privacy proceeds as follows. Recall that the sample space is $[0,1]^r$. Fix a constant
$0 < \delta < 1$ and define the smoothed histogram

$\hat{f}_{m,\delta}(x) = (1 - \delta)\hat{f}_m(x) + \delta$.   (5)

Theorem 4.1. Let $Z = (Z_1, \dots, Z_k)$ where $Z_1, \dots, Z_k$ are $k$ iid draws from
$\hat{f}_{m,\delta}(x)$. If

$k \log\left( \dfrac{(1 - \delta) m}{n \delta} + 1 \right) \le \alpha$   (6)

then $\alpha$-differential privacy holds.

Note that for $\delta \to 0$ and $m/(n\delta) \to 0$,
$\log\left( \frac{(1-\delta) m}{n\delta} + 1 \right) = \frac{m}{n\delta}(1 + o(1)) \approx \frac{m}{n\delta}$.
Thus (6) is approximately the same as requiring

$\dfrac{mk}{\delta} \le n\alpha$.   (7)

Equation (7) shows an interesting tradeoff between $m$, $k$ and $\delta$. We note that sampling from
the usual histogram, corresponding to $\delta = 0$, does not preserve differential privacy.
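
The smoothed-histogram scheme can be sketched for one-dimensional data ($r = 1$) as follows. The sketch first checks a sufficient condition of the form $k \log((1-\delta)m/(n\delta) + 1) \le \alpha$, which bounds the ratio of the sampling densities on neighbouring databases, and then samples from the mixture of the histogram and the uniform density. The function name, the parameter values, and the restriction to $r = 1$ are assumptions made for illustration.

```python
import math
import random


def smoothed_histogram_release(x, m, k, delta, alpha, rng=random):
    """Draw k points from (1 - delta) * f_m + delta on [0, 1] (r = 1),
    after checking k * log((1 - delta) * m / (n * delta) + 1) <= alpha."""
    n = len(x)
    if k * math.log((1 - delta) * m / (n * delta) + 1) > alpha:
        raise ValueError("privacy condition fails: reduce k or m, or increase delta")
    counts = [0] * m
    for xi in x:
        counts[min(int(xi * m), m - 1)] += 1  # bin index of xi
    # Under the mixture, bin j has probability (1 - delta) * C_j / n + delta / m.
    probs = [(1 - delta) * c / n + delta / m for c in counts]
    bins = rng.choices(range(m), weights=probs, k=k)
    # The density is flat within a bin, so draw uniformly inside the chosen bin.
    return [(j + rng.random()) / m for j in bins]


rng = random.Random(1)
x = [rng.random() for _ in range(1000)]
z = smoothed_histogram_release(x, m=10, k=5, delta=0.5, alpha=1.0, rng=rng)
print(z)
```

With these hypothetical values the left-hand side of the condition is $5 \log(1.01) \approx 0.05$, well below $\alpha = 1$; shrinking $\delta$ toward 0 makes the check fail, in line with the remark that the unsmoothed histogram is not private.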
Now we consider how to choose $m$, $k$, $\delta$ to minimize $E(\rho(F, \hat{F}_Z))$ while
satisfying (6). Here, $E$ is the expectation under the randomness due to sampling from $P$ and due to
the privacy mechanism $Q_n$. Thus, for any measurable function $h$,

$E(h(Z)) = \int \int h(z_1, \dots, z_k)\,dQ_n(z_1, \dots, z_k \mid x_1, \dots, x_n)\,dP(x_1) \cdots dP(x_n)$.

Now we give a result that shows how accurate the inferences are in the KS distance using the
smoothed histogram sampling scheme.

Theorem 4.2. Suppose that $Z_1, \dots, Z_k$ are drawn as described in the previous theorem. Suppose
(6) holds. Let $\rho$ be the KS distance. Then choosing $m \asymp n^{r/(6+r)}$,
$k \asymp m^{4/r} = n^{4/(6+r)}$ and $\delta = mk/(n\alpha)$ minimizes $E\rho(F, \hat{F}_Z)$ subject
to (6). In this case, $E\rho(F, \hat{F}_Z) = O\left(\sqrt{\log n}\; n^{-2/(6+r)}\right)$.
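
As a worked example of the orders of magnitude involved, the sketch below evaluates the choices $m = n^{r/(6+r)}$, $k = n^{4/(6+r)}$ and $\delta = mk/(n\alpha)$ with every unspecified constant set to 1. Setting the constants to 1 is an assumption made for illustration, since the theorem determines these quantities only up to constants.

```python
def ks_smoothed_histogram_choices(n, r, alpha):
    """Orders of magnitude for m, k and delta with all constants set to 1
    (an illustrative assumption; only the rates are specified)."""
    m = n ** (r / (6 + r))
    k = n ** (4 / (6 + r))
    delta = m * k / (n * alpha)
    return m, k, delta


# With n = 10^7, r = 1 and alpha = 1: m = 10, k = 10^4, delta = 10^-2.
m, k, delta = ks_smoothed_histogram_choices(10**7, 1, 1.0)
print(round(m, 6), round(k, 6), round(delta, 6))
```

Even with ten million records, the released database has only about ten thousand points, which shows how strongly the privacy constraint limits $k$ under this scheme.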

In this case we see that we have consistency since $\rho(F, \hat{F}_Z) = o_P(1)$, but the rate is
slower than the minimax rate of convergence for density estimators in KS distance, which is
$n^{-1/2}$. Now let $\hat{q}_j = \#\{Z_i \in B_j\}/k$ and

$\rho(F, \hat{F}_Z) = \int (p(x) - \hat{f}_Z(x))^2\,dx$, where $\hat{f}_Z(x) = h^{-r} \sum_{j=1}^m \hat{q}_j I(x \in B_j)$.   (8)

Theorem 4.3. Assume the conditions of the previous theorem. Let $\rho$ be the squared $L_2$ distance
as defined in (8). Then choosing

$m \asymp n^{r/(2r+3)}$, $k \asymp n^{(r+2)/(2r+3)}$, $\delta \asymp n^{-1/(2r+3)}$

minimizes $E\rho(F, \hat{F}_Z)$ subject to (6). In this case,
$E\rho(F, \hat{F}_Z) = O(n^{-2/(2r+3)})$.

Again, we have consistency but the rate is slower than the minimax rate, which is $n^{-2/(2+r)}$
(Scott, 1992).
4.2 Sampling From a Perturbed Histogram

The second method, which we call sampling from a perturbed histogram, is due to Dwork et al.
(2006). Recall that $C_j$ is the number of observations in bin $B_j$. Let $D_j = C_j + \nu_j$ where
$\nu_1, \dots, \nu_m$ are independent, identically distributed draws from a Laplace density with mean
0 and variance $8/\alpha^2$. Thus the density of $\nu_j$ is
$g(\nu) = (\alpha/4) e^{-|\nu| \alpha/2}$. Dwork et al. (2006) show that releasing
$D = (D_1, \dots, D_m)$ preserves differential privacy. However, our goal is to release a
database $Z = (Z_1, \dots, Z_k)$ rather than just a set of counts. Now define

$\tilde{D}_j = \max\{D_j, 0\}$ and $\hat{q}_j = \tilde{D}_j / \sum_{s=1}^m \tilde{D}_s$.

Since $D$ preserves differential privacy, it follows from Lemma 2.6 that
$(\hat{q}_1, \dots, \hat{q}_m)$ also preserves differential privacy. Moreover, any sample
$Z = (Z_1, \dots, Z_k)$ from $\tilde{f}(x) = h^{-r} \sum_{j=1}^m \hat{q}_j I(x \in B_j)$
preserves differential privacy for any $k$.
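
The perturbed-histogram scheme can be sketched for $r = 1$ as below. Laplace noise with scale $2/\alpha$ (variance $8/\alpha^2$) is added to each cell count, negative values are clipped at zero, and the normalised weights are used to sample a synthetic database. The function name and the uniform fallback used when every noisy count is non-positive are assumptions made for illustration; the text does not specify that corner case.

```python
import random


def perturbed_histogram_release(x, m, k, alpha, rng=random):
    """Sample k points on [0, 1] (r = 1) from a histogram whose cell counts
    have been perturbed by Laplace noise with variance 8 / alpha^2."""
    counts = [0] * m
    for xi in x:
        counts[min(int(xi * m), m - 1)] += 1
    scale = 2.0 / alpha  # Laplace scale b = 2/alpha gives variance 2 b^2 = 8/alpha^2
    noisy = [c + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
             for c in counts]
    clipped = [max(d, 0.0) for d in noisy]
    total = sum(clipped)
    if total == 0.0:
        # Assumed fallback (not from the paper): use uniform weights.
        clipped, total = [1.0] * m, float(m)
    weights = [d / total for d in clipped]
    bins = rng.choices(range(m), weights=weights, k=k)
    # The density is flat within a bin, so draw uniformly inside the chosen bin.
    return [(j + rng.random()) / m for j in bins]


rng = random.Random(7)
x = [rng.betavariate(10, 10) for _ in range(1000)]
z = perturbed_histogram_release(x, m=10, k=1000, alpha=0.1, rng=rng)
print(len(z), min(z), max(z))
```

Because privacy is already secured by the noisy counts, the size $k$ of the released database is unconstrained here, in contrast with the smoothed-histogram scheme.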

Theorem 4.4. Let $Z = (Z_1, \dots, Z_k)$ be drawn from
$\tilde{f}(x) = h^{-r} \sum_{j=1}^m \hat{q}_j I(x \in B_j)$. Assume that
there exists a constant $1 < C < \infty$ such that $\sup_x p(x) = C$.

(1) Let $\rho$ be the $L_2$ distance and $\hat{f}_Z$ be as defined in (8). Let
$m \asymp n^{r/(2+r)}$ and let $k \ge n$. Then we have $E\rho(F, \hat{F}_Z) = O(n^{-2/(2+r)})$.

(2) Let p be the KS distance. Let m x nr'^'^+''\ Then Ep(F, Fz) = O (mm \f^^ ) • 

Hence, this method achieves the minimax rate of convergence in $L_2$ while the first data release
method does not. This suggests that the perturbation method is preferable for the $L_2$ distance.
The perturbation method does not achieve the minimax rate of convergence in KS distance; in
fact, the exponential mechanism based method achieves a better rate, as we show in Section 5
(Theorem 5.4). We examine this method numerically in Section 7.



Another approach to histograms is given by Machanavajjhala et al. (2008). They put a Dirichlet
$(a_1, \dots, a_m)$ prior on the cell probabilities $p_1, \dots, p_m$ where $p_j = P(X_i \in B_j)$.
The corresponding posterior is Dirichlet $(a_1 + C_1, \dots, a_m + C_m)$. Next they draw
$q = (q_1, \dots, q_m)$ from the posterior and finally they sample new cell counts
$D = (D_1, \dots, D_m)$ from a Multinomial $(k, q)$. Thus, the distribution of $D$ given $X$ is

$P(D = d \mid X) = \dfrac{k!}{\prod_{j=1}^m d_j!}\ \dfrac{\Gamma\left(n + \sum_j a_j\right)}{\Gamma\left(k + n + \sum_j a_j\right)}\ \prod_{j=1}^m \dfrac{\Gamma(a_j + C_j + d_j)}{\Gamma(a_j + C_j)}$.

They show that differential privacy requires $a_j + C_j \ge k/(e^{\alpha} - 1)$ for all $j$. If we
take $a_1 = a_2 = \cdots = a_m$ then this is similar to the first histogram-based data release method
we discussed in this section. They also suggest a weakened version of differential privacy.



5 Exponential Mechanism 

In this section we will consider the exponential mechanism in some detail. We'll derive some gen- 
eral results about accuracy and apply the method to the mean, and to density estimation. Specifi- 
cally, we will show the following for exponential mechanisms: 

1. Choosing the size k of the released database is delicate. Taking k too large compromises 
privacy. Taking k too small compromises accuracy. 

2. The accuracy of the exponential scheme can be bounded by a simple formula. This formula 
has a term that measures how likely it is for a distribution based on sample size k, to be in 
a small ball around the true distribution. In probability theory, this is known as a small ball 
probability. 

3. The formula can be applied to several examples such as the KS distance, the mean, and 
nonparametric density estimation using orthogonal series. In each case we can use our results 
to choose k and to find the rate of convergence of an estimator based on the sanitized data. 

In light of Theorem 3.2 we know that some assumptions are needed on $P$. We shall assume
throughout this section that $P$ has a bounded density $p$; note that this is a weaker condition than
(4).
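Recall the exponential mechanism. We draw the vector $Z = (Z_1, \dots, Z_k)$ from $h(z \mid x)$ where

$h(z \mid x) = \dfrac{g_x(z)}{\int_{\mathcal{X}^k} g_x(s)\,ds}$, where $g_x(z) = \exp\left( -\dfrac{\alpha\,\rho(\hat{F}_x, \hat{F}_z)}{2\Delta_{n,k}} \right)$ and   (9)

$\Delta = \Delta_{n,k} = \sup_{x, y :\, \delta(x,y)=1}\ \sup_{z} |\rho(\hat{F}_x, \hat{F}_z) - \rho(\hat{F}_y, \hat{F}_z)|$.

Lemma 5.1. For KS distance, $\Delta_{n,k} \le 1/n$.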
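
The sensitivity bound for the KS distance can be checked numerically in one dimension: replacing a single entry of a database of size $n$ moves its empirical cdf by at most $1/n$ everywhere, so the KS distance to any fixed database changes by at most $1/n$. The sketch below verifies this on random inputs; the sizes, the seed and the helper names are illustrative assumptions.

```python
import random


def ks_distance(sample_a, sample_b):
    """sup_t |F_a(t) - F_b(t)| between two one-dimensional empirical cdfs,
    evaluated at the pooled data points where the supremum is attained."""
    na, nb = len(sample_a), len(sample_b)

    def gap(t):
        fa = sum(1 for v in sample_a if v <= t) / na
        fb = sum(1 for v in sample_b if v <= t) / nb
        return abs(fa - fb)

    return max(gap(t) for t in list(sample_a) + list(sample_b))


rng = random.Random(0)
n, k = 50, 20
worst = 0.0
for _ in range(200):
    x = [rng.random() for _ in range(n)]
    z = [rng.random() for _ in range(k)]
    y = list(x)
    y[rng.randrange(n)] = rng.random()  # neighbouring database: one entry changed
    worst = max(worst, abs(ks_distance(x, z) - ks_distance(y, z)))

# The observed change never exceeds 1/n (up to floating-point rounding).
print(worst, 1 / n)
```

A random search like this only illustrates the bound; it follows in general from the triangle inequality, since the KS distance between the two neighbouring empirical cdfs is itself at most $1/n$.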



This framework is used in Blum et al. (2008). For the rest of this section, assume that
$Z = (Z_1, \dots, Z_k)$ are drawn from an exponential mechanism $Q_n$.

Definition 5.2. Let $F$ denote the cumulative distribution function on $\mathcal{X}$ corresponding
to $P$. Let $G$ denote the empirical cdf from a sample of size $k$ from $P$, and let

$R(k, \epsilon) = P(\rho(F, G) \le \epsilon)$.

$R(k, \epsilon)$ is called the small ball probability associated with $\rho$.

The following theorem bounds the accuracy of the estimator from the sanitized data by a simple 
formula involving the small ball probability. 

Theorem 5.3. Assume that $P$ has a bounded density $p$, and that there exists $\epsilon_n \to 0$
such that

$P\left( \rho(F, \hat{F}_X) > \dfrac{\epsilon_n}{16} \right) = O\left( \dfrac{1}{n^c} \right)$   (10)

for some $c > 1$. Further suppose that $\rho$ satisfies the triangle inequality. Let
$Z = (Z_1, \dots, Z_k)$ be drawn from $g_x(z)$ given in (9). Then,

Thus, if we can choose $k = k_n$ in such a way that the right hand side of (11) goes to 0, then the
mechanism is consistent. We now show some examples that satisfy these conditions and we show
how to choose $k_n$.
5.1 The KS Distance

Theorem 5.4. Suppose that $P$ has a bounded density $p$ and let $B := \log \sup_x p(x) > 0$. Let
$Z = (Z_1, \dots, Z_k)$ be drawn from $g_x(z)$ given in (9) with $\rho$ being the KS distance. By
requiring that $k_n \asymp (\alpha/B)^{2/3} n^{2/3}$, we have for
$\epsilon_n = 2 (B/\alpha)^{1/3} n^{-1/3}$, and for $\rho$ being the KS distance,

$\rho(F, \hat{F}_Z) = O_P(\epsilon_n)$.   (12)

Note that $\rho(F, \hat{F}_Z)$ converges to 0 at a slower rate than $\rho(F, \hat{F}_X)$. We thus
see that the rate after sanitization is $n^{-1/3}$, which is slower than the optimal rate of
$n^{-1/2}$. It is an open question whether this rate can be improved.



5.2 The Mean 

It is interesting to consider what happens when $\rho(F, \hat{F}_Z) = \|\mu - \bar{Z}\|^2$ where
$\mu = \int x\,dP(x)$ and $\bar{Z}$ is the sample mean of $Z$. In this case $\Delta \le r/n$. Thus,
$h(z \mid x) \propto e^{-\alpha n \|\bar{z} - \bar{x}\|^2/(2r)}$ so, approximately,
$Z_1, \dots, Z_k \sim N(\bar{X}, kr/(n\alpha))$. Indeed, it suffices to take $k = 1$ in this case
since then $\bar{Z} = \bar{X} + O_P(1/\sqrt{n})$. Thus $\bar{Z}$ converges at the same rate as
$\bar{X}$. This is not surprising: preserving a single piece of information requires a database of
size $k = 1$.



6 Orthogonal Series Density Estimation 

In this section, we develop an exponential scheme based on density estimation and we compare it
to the perturbation approach. For simplicity we take $r = 1$. Let $\{1, \psi_1, \psi_2, \dots\}$ be
an orthonormal basis for $L_2(0,1) = \{f : \int_0^1 f^2(x)\,dx < \infty\}$ and assume that
$p \in L_2(0,1)$. Hence

$p(x) = 1 + \sum_{j=1}^{\infty} \beta_j \psi_j(x)$ where $\beta_j = \int_0^1 \psi_j(x) p(x)\,dx$.
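We assume that the basis functions are uniformly bounded so that

$c_0 = \sup_j \sup_x |\psi_j(x)| < \infty$.   (13)

Let $\mathcal{B}(\gamma, C)$ denote the Sobolev ellipsoid

$\mathcal{B}(\gamma, C) = \left\{ \beta = (\beta_1, \beta_2, \dots) : \sum_{j=1}^{\infty} \beta_j^2 j^{2\gamma} \le C^2 \right\}$

where $\gamma > 1/2$. Let

$\mathcal{P}(\gamma, C) = \left\{ p(x) = 1 + \sum_{j=1}^{\infty} \beta_j \psi_j(x) : \beta \in \mathcal{B}(\gamma, C) \right\}$.

The minimax rate of convergence in $L_2$ norm for $\mathcal{P}(\gamma, C)$ is
$n^{-2\gamma/(2\gamma+1)}$ (Efromovich, 1999). Thus

$\inf_{\hat{p}} \sup_{p \in \mathcal{P}(\gamma, C)} E \int (\hat{p}(x) - p(x))^2\,dx \ge c_1 n^{-2\gamma/(2\gamma+1)}$

for some $c_1 > 0$. This rate is achieved by the estimator

$\hat{p}(x) = 1 + \sum_{j=1}^{m_n} \hat{\beta}_j \psi_j(x)$   (14)

where $m_n = n^{1/(2\gamma+1)}$ and $\hat{\beta}_j = n^{-1} \sum_{i=1}^n \psi_j(X_i)$. See
Efromovich (1999).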

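For a function $u \in L_2(0,1)$, let us define $\|u\|_2 = \left( \int_0^1 |u(x)|^2\,dx \right)^{1/2}$,
which is a norm on $L_2(0,1)$. Now consider an exponential mechanism based on

$\xi(x, z) = \left( \int_0^1 (\hat{p}(t) - \hat{p}^*(t))^2\,dt \right)^{1/2} = \|\hat{p} - \hat{p}^*\|_2$   (15)

where

$\hat{p}^*(x) = 1 + \sum_{j=1}^{m_k} \hat{\beta}_j^* \psi_j(x)$ for $m_k = k^{1/(2\gamma+1)}$ and $\hat{\beta}_j^* = k^{-1} \sum_{i=1}^k \psi_j(Z_i)$.   (16)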
Lemma 6.1. Under the above scheme we have $\Delta \le c_0 \sqrt{m_n}/n$ for $c_0$ as defined in
(13). Hence,

$g_x(z) = \exp\left( -\dfrac{\alpha\,\xi(x,z)}{2\Delta} \right) \le \exp\left( -\dfrac{\alpha n \|\hat{p} - \hat{p}^*\|_2}{2 c_0 \sqrt{m_n}} \right)$ almost surely.   (17)

Theorem 6.2. Let $Z = (Z_1, \dots, Z_k)$ be drawn from $g_x(z)$ given in (17). Assume that
$\gamma > 1$. If we choose $k \asymp \sqrt{n}$ then

$\rho^2(p, \hat{p}^*) = O_P\left( n^{-\gamma/(2\gamma+1)} \right)$.



We conclude that the sanitized estimator converges at a slower rate than the minimax rate. Now
we compare this to the perturbation approach. Let $Z = (Z_1, \dots, Z_k)$ be an iid sample from

$\hat{q}(x) = 1 + \sum_{j=1}^{m_n} (\hat{\beta}_j + \nu_j) \psi_j(x)$

where $\nu_1, \dots, \nu_{m_n}$ are iid draws from a Laplace distribution with density
$g(\nu) = \left( n\alpha/(2 c_0 m_n) \right) e^{-n\alpha|\nu|/(c_0 m_n)}$. Thus, in the notation of
Lemma 2.6, $R = (\nu_1, \dots, \nu_{m_n})$. It follows from Lemma 2.6 that, for any $k$,
this preserves differential privacy. If $\hat{q}(x) < 0$ for any $x$ then we replace $\hat{q}$ by
$\hat{q}(x) I(\hat{q}(x) > 0) / \int \hat{q}(s) I(\hat{q}(s) > 0)\,ds$ as in Hall and Murison (1993).
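
The perturbed orthogonal series estimator can be sketched with the cosine basis $\psi_j(x) = \sqrt{2}\cos(\pi j x)$, for which $c_0 = \sqrt{2}$. The choice of basis, the Laplace scale $c_0 m/(n\alpha)$ (an assumption read off the normalising constant of the noise density), the grid-based normalisation, and all function names are illustrative assumptions rather than the paper's implementation.

```python
import math
import random

C0 = math.sqrt(2.0)  # sup |psi_j| for the cosine basis used in this sketch


def psi(j, t):
    """Cosine basis function psi_j(t) = sqrt(2) * cos(pi * j * t) on [0, 1]."""
    return C0 * math.cos(math.pi * j * t)


def perturbed_series_density(x, m, alpha, rng=random, grid_size=200):
    """Evaluate the perturbed series density on a midpoint grid.

    Each coefficient estimate receives Laplace noise with scale
    C0 * m / (n * alpha) (assumed scale); negative values of the resulting
    function are set to zero and the result is renormalised on the grid.
    """
    n = len(x)
    scale = C0 * m / (n * alpha)
    coeffs = []
    for j in range(1, m + 1):
        beta_hat = sum(psi(j, xi) for xi in x) / n
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        coeffs.append(beta_hat + noise)
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    q = [1.0 + sum(b * psi(j + 1, t) for j, b in enumerate(coeffs)) for t in grid]
    q = [max(v, 0.0) for v in q]          # truncate negative parts
    total = sum(q) / grid_size            # midpoint-rule approximation of the integral
    return grid, [v / total for v in q]


rng = random.Random(3)
x = [rng.betavariate(10, 10) for _ in range(500)]
grid, density = perturbed_series_density(x, m=4, alpha=1.0, rng=rng)
print(max(density))
```

Released data $Z$ could then be drawn from the returned grid density; the sketch stops at the density itself to keep the noise-addition step in focus.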



Theorem 6.3. Let $Z = (Z_1, \dots, Z_k)$ be drawn from $\hat{q}$. Assume that $\gamma > 1$. If we
choose $k \ge n$, then

$\rho^2(p, \hat{p}_Z) = O_P\left( n^{-2\gamma/(2\gamma+1)} \right)$

where $\hat{p}_Z$ is the orthogonal series density estimator based on $Z$.

Hence, again, the perturbation technique achieves the minimax rate of convergence and so 
appears to be superior to the exponential mechanism. We do not know if this is because the 
exponential mechanism is inherently less accurate, or if our bounds for the exponential mechanism 
are not tight enough. 
Figure 1: Top two plots n = 100. Bottom two plots n = 1,000. Each plot shows the mean
integrated squared error of the histogram. The lower line is from the histogram based on the 
original data. The upper line is based on the perturbed histogram. 

7 Example 

Here we consider a small simulation study to see the effect of perturbation on accuracy. We focus
on the histogram perturbation method with $r = 1$. We take the true density of $X$ to be a
Beta(10,10) density. We considered sample sizes $n = 100$ and $n = 1{,}000$ and privacy levels
$\alpha = 0.1$ and $\alpha = 0.01$. We take $\rho$ to be squared error distance. Figure 1 shows the
results of 1,000 simulations for various numbers of bins $m$.
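
A scaled-down version of this kind of experiment can be sketched as follows. It compares the average integrated squared error of the ordinary histogram with that of the perturbed histogram for Beta(10,10) data. The number of replications (50 instead of 1,000), the grid approximation of the integral, the uniform fallback when all noisy counts are non-positive, and the function names are assumptions made for illustration, so the numbers will not reproduce the figure.

```python
import math
import random


def beta_pdf(t, a=10.0, b=10.0):
    """Density of the Beta(a, b) distribution at t."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))


def ise(cell_probs, grid_size=400):
    """Midpoint-rule approximation of the integrated squared error between
    the histogram with the given cell probabilities and the Beta(10,10) density."""
    m = len(cell_probs)
    total = 0.0
    for i in range(grid_size):
        t = (i + 0.5) / grid_size
        height = m * cell_probs[min(int(t * m), m - 1)]
        total += (height - beta_pdf(t)) ** 2
    return total / grid_size


def simulate(n, m, alpha, sims, rng):
    """Average ISE of the plain and the perturbed histogram over `sims` runs."""
    scale = 2.0 / alpha  # Laplace scale giving variance 8 / alpha^2
    plain = noisy = 0.0
    for _ in range(sims):
        counts = [0] * m
        for _ in range(n):
            counts[min(int(rng.betavariate(10, 10) * m), m - 1)] += 1
        plain += ise([c / n for c in counts])
        d = [max(c + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale), 0.0)
             for c in counts]
        s = sum(d)
        # Assumed fallback: uniform weights if every noisy count is clipped to 0.
        q = [v / s for v in d] if s > 0 else [1.0 / m] * m
        noisy += ise(q)
    return plain / sims, noisy / sims


plain_mise, noisy_mise = simulate(n=100, m=10, alpha=0.01, sims=50, rng=random.Random(0))
print(plain_mise, noisy_mise)
```

Varying `m` and `alpha` in `simulate` gives a rough picture of how the error of the sanitized histogram depends on the number of bins and the privacy level.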

As expected, smaller values of $\alpha$ induce a larger information loss which manifests itself as a larger mean squared error. Despite the fact that the perturbed histogram achieves the minimax rate, the error is substantially inflated by the perturbation. This means that the constants in the risk are important, not just the rate. Also, the risk of the sanitized histograms is much more sensitive to the choice of the number of cells than the original histogram is.

We repeated the simulations with a bimodal density, namely, $p(x)$ being an equal mixture of a Beta(10,3) density and a Beta(3,10) density. The results turned out to be nearly identical to those above.

8 Conclusion 

Differential privacy is an important type of privacy guarantee when releasing data. Our goal has 
been to present the idea in statistical language and then to show that loss functions based on distri- 
butions and densities can be useful for comparing privacy mechanisms. 

We have seen that sampling from a histogram leads to differential privacy as long as either the histogram is shifted away from 0 by a factor $\delta$ or the cells are perturbed appropriately. The latter method achieves a faster rate of convergence in $L_2$ distance. But the simulation showed that the risk can nonetheless be quite large. This suggests that more work is needed to get precise finite sample risk bounds. Also, the choice of the smoothing parameter (number of cells in the histogram) has a larger effect on the sanitized histogram than on the original histogram.

We also studied the exponential mechanism. Here we derived a formula for assessing the 
accuracy of the method. The formula involves small ball probabilities. As far as we know, the 
connection between differential privacy and small ball probabilities has not been observed before. 

Minimaxity is desirable for any statistical procedure. We have seen that in some cases the 
minimax rate is achieved and in some cases it is not. We do not yet have a complete minimax 
theory for differential privacy and this is the focus of our current work. We close with some open 
questions. 

1. When is it possible for $\rho(F, \hat F_Z)$ to have the same rate as $\rho(F, \hat F_X)$?

2. When adaptive minimax methods are used, such as adapting to $\gamma$ in Section 6 or when using wavelet estimation methods, is some form of adaptivity preserved after sanitization?

3. Many statistical methods involve some sort of risk minimization. An example is choosing a bandwidth by cross-validation. What is the effect of sanitization on these procedures?

4. Are there other, better methods of sanitization that preserve differential privacy? 

9 Proofs 

9.1 Proof of Theorem Q 

Without loss of generality take $i = 1$. Let $M_0(B) = \int Q(B \mid s, x_2, \ldots, x_n)\,dP(x_2, \ldots, x_n)$ and $M_1(B) = \int Q(B \mid t, x_2, \ldots, x_n)\,dP(x_2, \ldots, x_n)$. By the Neyman-Pearson lemma, the highest power test is to reject $H_0$ when $U > u$ where $U(z) = (dM_1/dM_0)(z)$ and $u$ is chosen so that $\int I(U(z) > u)\,dM_0(z) \le \gamma$. Since $(s, x_2, \ldots, x_n)$ and $(t, x_2, \ldots, x_n)$ differ in only one coordinate, $M_1(B) \le e^{\alpha} M_0(B)$ and so the power is $M_1(U > u) \le e^{\alpha} M_0(U > u) \le \gamma e^{\alpha}$. □
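The bound $\gamma e^{\alpha}$ can be checked numerically on a toy mechanism. The sketch below is an illustration only: it assumes binary randomized response as the release mechanism (which is not the mechanism studied in this paper), and the function names are hypothetical. It computes the power of the most powerful, possibly randomized, level-$\gamma$ test on a finite outcome space and compares it with $\gamma e^{\alpha}$.

```python
import math


def randomized_response(alpha):
    """Return (m0, m1): the laws of Z in {0, 1} when the true bit is 0 or 1."""
    keep = math.exp(alpha) / (1.0 + math.exp(alpha))  # P(report the true bit)
    m0 = [keep, 1.0 - keep]  # law of Z when the true bit is 0
    m1 = [1.0 - keep, keep]  # law of Z when the true bit is 1
    return m0, m1


def most_powerful_power(m0, m1, level):
    """Power of the Neyman-Pearson level-`level` test of m0 against m1."""
    # Visit outcomes in decreasing order of the likelihood ratio m1 / m0.
    order = sorted(range(len(m0)),
                   key=lambda z: m1[z] / m0[z] if m0[z] > 0 else math.inf,
                   reverse=True)
    size_left, power = level, 0.0
    for z in order:
        if size_left <= 0:
            break
        # Reject fully if the outcome fits in the remaining size, else randomize.
        frac = 1.0 if m0[z] <= size_left else size_left / m0[z]
        power += frac * m1[z]
        size_left -= frac * m0[z]
    return power


alpha, level = 1.0, 0.05
m0, m1 = randomized_response(alpha)
print(most_powerful_power(m0, m1, level), level * math.exp(alpha))
```

For this particular mechanism the likelihood ratio attains $e^{\alpha}$ on one outcome, so for small levels the two printed numbers should agree up to rounding, which suggests the bound cannot be improved in general.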



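9.2 Proof of Lemma 2.6

For the first part simply note that $P(h(T(X,R)) \in B \mid X = x) = P(T(X,R) \in h^{-1}(B) \mid X = x) \le e^{\alpha} P(T(X,R) \in h^{-1}(B) \mid X = x') = e^{\alpha} P(h(T(X,R)) \in B \mid X = x')$.

For the second part, let $Z = (Z_1, \ldots, Z_k)$ and note that $Z$ is independent of $X$ given $T(X,R)$. Let $H$ be the distribution of $T(X,R)$. Hence,

$$
\begin{aligned}
P(Z \in B \mid X = x) &= \int P(Z \in B \mid X = x, T = t)\,dH(t \mid X = x) \\
&= \int P(Z \in B \mid T = t)\,dH(t \mid X = x) \\
&\le e^{\alpha} \int P(Z \in B \mid T = t)\,dH(t \mid X = x') \\
&= e^{\alpha} P(Z \in B \mid X = x').
\end{aligned}
$$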






9.3 Proof of Theorem 3.2

Our proof is adapted from an argument given in Theorem 5.1 of Blum et al. (2008). Let $r = 1$ so that $\mathcal{X} = [0, 1]$. Let $P = \delta_0$ where $\delta_0$ denotes a point mass at 0. Then $P^n(X = X_{(0)}) = 1$ where $X_{(0)} = \{0, \ldots, 0\}$. Assume that $Q_n$ is consistent. Since $F(0) = 1$, it follows that for any $\delta > 0$, $P(\hat F_Z(0) > 1 - \delta) \to 1$. But since $P(\cdot) = E_P\, Q_n(\cdot \mid X)$ and since $P^n(X = X_{(0)}) = 1$, this implies that $Q_n(\hat F_Z(0) > 1 - \delta \mid X = X_{(0)}) \to 1$.

Let $v > 0$ be any point in $[0, 1]$ such that $Q_n(Z = v \mid X = X_{(0)}) = 0$. Let $X_{(1)} = \{v, 0, \ldots, 0\}$, $X_{(2)} = \{v, v, 0, \ldots, 0\}$, ..., $X_{(n)} = \{v, v, \ldots, v\}$. By assumption, $Q_n(Z = X_{(j)} \mid X = X_{(0)}) = 0$ for all $j \ge 1$. Differential privacy implies that $Q_n(Z = X_{(j)} \mid X = X_{(1)}) = 0$ for all $j \ge 1$. Applying differential privacy again implies that $Q_n(Z = X_{(j)} \mid X = X_{(2)}) = 0$ for all $j \ge 1$. Continuing this way, we conclude that $Q_n(Z = X_{(j)} \mid X = X_{(n)}) = 0$ for all $j \ge 1$.

Next let $P = \delta_v$. Arguing as before, we know that $Q_n(\hat F_Z(v) < 1 - \delta \mid X = X_{(n)}) \to 0$. And since $F(v-) = 0$ we also have that $Q_n(\hat F_Z(v-) > \delta \mid X = X_{(n)}) \to 0$. Here, $F(v-) = \lim_{i \to \infty} F(v_i)$ where $v_1 < v_2 < \ldots$ and $v_i \to v$. Hence, for $j/n > 1 - \delta$, $Q_n(Z = X_{(j)} \mid X = X_{(n)}) > 0$, which is a contradiction. □



9.4 Proof of Theorem 4.1 



Suppose that $X$ differs from $Y$ in at most one observation. Let $\hat f$ denote the perturbed histogram based on $X$ and let $\hat g_{m,\delta}$ denote the histogram based on $Y$, such that $X$ and $Y$ differ in one entry. We also use $\hat p_j(X)$ and $\hat p_j(Y)$ for cell proportions. Note that $|\hat p_j(X) - \hat p_j(Y)| \le 1/n$ by definition. It is clear that the maximum density ratio for a single draw $x_i$, for all $i$, occurs in one bin $B_j$. Now consider $x = (x_1, \ldots, x_k)$ such that for all $i = 1, \ldots, k$, we have $x_i \in B_j \subset [0, 1]^r$ and the following bounds.

1. Let $\hat p_j(Y) = 0$; then in order to maximize $\hat f(x)/\hat g(x)$, we let $\hat p_j(X) = 1/n$ and obtain

$$\frac{\hat f(x)}{\hat g(x)} = \prod_{i=1}^{k} \frac{\hat f_{m,\delta}(x_i)}{\hat g_{m,\delta}(x_i)} \le \left(\frac{(1-\delta)m(1/n) + \delta}{\delta}\right)^{k} = \left(\frac{(1-\delta)m}{n\delta} + 1\right)^{k}.$$

2. Otherwise, we let $\hat p_j(Y) \ge 1/n$ (as by definition of $\hat p_j$, it takes values $z/n$ for non-negative integers $z$) and let $\hat p_j(X) = \hat p_j(Y) \pm 1/n$. Now it is clear that in order to maximize the density ratio at $x$, we may need to reverse the role of $X$ and $Y$,

$$\max\left(\frac{\hat f(x)}{\hat g(x)}, \frac{\hat g(x)}{\hat f(x)}\right) = \left(\frac{(1-\delta)m\hat p_j + \delta}{(1-\delta)m(\hat p_j - (1/n)) + \delta}\right)^{k} = \left(1 + \frac{(1-\delta)m(1/n)}{(1-\delta)m(\hat p_j - (1/n)) + \delta}\right)^{k} \le \left(\frac{(1-\delta)m}{n\delta} + 1\right)^{k}$$

where the maximum is achieved when $\hat p_j(Y) = 1/n$ and $\hat p_j(X) = 0$, given a fixed set of parameters $m, n, \delta$.

Thus we have

$$\sup_{x \in ([0,1]^r)^k} \frac{\hat f(x)}{\hat g(x)} \le \left(\frac{(1-\delta)m}{n\delta} + 1\right)^{k}$$

and the theorem holds. □



9.5 Proof of Theorem 4.2 



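Recall that $\hat F_Z$ denotes the empirical distribution function corresponding to $Z = (Z_1, \ldots, Z_k)$, where $Z_i \in [0, 1]^r$ for all $i$ are i.i.d. draws from the density function $\hat f_{m,\delta}(x)$ as in (8) given $X = (X_1, \ldots, X_n)$. Let $U$ denote the uniform cdf on $[0, 1]^r$. Given $X = (X_1, \ldots, X_n)$ drawn from a distribution whose cdf is $F$, let $\hat f_m$ denote the histogram estimator on $X$ and let $\hat F_m(x) = \int_0^x \hat f_m(s)\,ds$ and $\hat F_{m,\delta}(x) = (1 - \delta)\hat F_m(x) + \delta U(x)$. Define $F_m(x) = E(\hat F_m(x))$ and $f_m(x) = E(\hat f_m(x))$.

The Vapnik-Chervonenkis dimension of the class of sets of the form $\{(-\infty, x_1] \times \cdots \times (-\infty, x_r]\}$ is $r$ and so by the standard Vapnik-Chervonenkis bound, we have for $\epsilon > 0$ that

$$P\left(\sup_{t \in [0,1]^r} |\hat F_X(t) - F(t)| > \epsilon\right) \le 8 n^r \exp\left\{-\frac{n\epsilon^2}{32}\right\} \le \exp\left\{-\frac{n\epsilon^2}{64}\right\} \tag{18}$$

for large $n$. Hence, $E \sup_{t \in [0,1]^r} |\hat F_X(t) - F(t)| = O\left(\sqrt{r \log n / n}\right)$. Given $X$, we have $Z_1, \ldots, Z_k \sim \hat F_{m,\delta}$ and so $E \sup_{t \in [0,1]^r} |\hat F_Z(t) - \hat F_{m,\delta}(t)| = O\left(\sqrt{r \log k / k}\right)$. Thus,

$$
\begin{aligned}
E \sup_x |\hat F_Z(x) - F(x)| &\le E \sup_x |\hat F_Z(x) - \hat F_{m,\delta}(x)| + E \sup_x |\hat F_{m,\delta}(x) - F(x)| \\
&\le E \sup_x |\hat F_Z(x) - \hat F_{m,\delta}(x)| + E \sup_x |\hat F_m(x) - F(x)| + \delta \\
&= O\left(\sqrt{\frac{r \log k}{k}}\right) + E \sup_x |\hat F_m(x) - F(x)| + \delta.
\end{aligned}
$$

By the triangle inequality, we have for all $x \in [0, 1]^r$,

$$|\hat F_m(x) - F(x)| \le |\hat F_m(x) - F_m(x)| + |F_m(x) - F(x)|$$

and hence

$$
\begin{aligned}
E \sup_{x \in [0,1]^r} |\hat F_m(x) - F(x)| &\le E \sup_{x \in [0,1]^r} |\hat F_m(x) - F_m(x)| + E \sup_{x \in [0,1]^r} |F_m(x) - F(x)| \\
&= O\left(\sqrt{\frac{r \log n}{n}}\right) + E \sup_{x \in [0,1]^r} |F_m(x) - F(x)|
\end{aligned} \tag{19}
$$

where the last step follows from the VC bound as in (18) for $\hat F_m(x)$.

Next we bound $\sup_{x \in [0,1]^r} |F_m(x) - F(x)|$. Now $F(x) = P(A)$ where $A = \{(s_1, \ldots, s_r) : s_i \le x_i,\ i = 1, \ldots, r\}$. If $x = (j_1 h, \ldots, j_r h)$ for some integers $j_1, \ldots, j_r$ then $F(x) - F_m(x) = 0$. For $x$ not of this form, let $\tilde x = (j_1 h, \ldots, j_r h)$ where $j_i = \lfloor x_i/h \rfloor$. Let $R = \{(s_1, \ldots, s_r) : s_i \le \tilde x_i,\ i = 1, \ldots, r\}$. So

$$F(x) - F_m(x) = P(A) - P_m(A) = P(R) - P_m(R) + P(A \setminus R) - P_m(A \setminus R) = P(A \setminus R) - P_m(A \setminus R) \tag{20}$$

where $P_m(B) = \int_B dF_m(u)$ and the set $A \setminus R$ intersects at most $rh/h^r$ cubes in $\{B_1, \ldots, B_m\}$, given that $\mathrm{Vol}(A \setminus R) \le 1 - (1 - h)^r \le rh$. Now by the Lipschitz condition we have $\sup_{x \in [0,1]^r} |p(x) - f_m(x)| \le L h \sqrt{r}$ and

$$
\begin{aligned}
|P(A \setminus R) - P_m(A \setminus R)| &\le \text{number of cubes intersecting } (A \setminus R) \times \text{maximum density discrepancy} \times \text{volume of cube} \\
&\le (rh/h^r) \cdot (L h \sqrt{r}) \cdot h^r \le L r^{3/2} m^{-2/r}.
\end{aligned} \tag{21}
$$

Thus we have by (19), (20) and (21)

$$E \sup_x |\hat F_m(x) - F(x)| = O\left(\sqrt{\frac{r \log n}{n}}\right) + L r^{3/2} m^{-2/r}. \tag{22}$$

Hence,

$$E \sup_x |\hat F_Z(x) - F(x)| = O\left(\sqrt{\frac{r \log k}{k}} + \sqrt{\frac{r \log n}{n}}\right) + L r^{3/2} m^{-2/r} + \delta.$$

Setting $m \asymp n^{r/(6+r)}$, $k \asymp m^{4/r} = n^{4/(6+r)}$ and $\delta = mk/(n\alpha)$, we get for all $n$ large enough, $E \sup_x |\hat F_Z(x) - F(x)| = O\left(\sqrt{\log n}\, / \, n^{2/(6+r)}\right)$. □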



9.6 Proof of Theorem 4.3 



Let $\hat f_Z$ be the histogram based on $Z$ as in (8). Then

$$(\hat f_Z(u) - p(u))^2 \preceq (1 - \delta)^2 (p(u) - \hat f_m(u))^2 + \delta^2 (p(u) - 1)^2 + (\hat f_{m,\delta}(u) - \hat f_Z(u))^2$$

where $\preceq$ means less than, up to constants. Hence,

$$E \int (\hat f_Z(u) - p(u))^2\,du \preceq R_m + \delta^2 + E \int (\hat f_{m,\delta}(u) - \hat f_Z(u))^2\,du$$

where $R_m$ is the usual $L_2$ risk of a histogram under the Lipschitz condition, namely, $m^{-2/r} + m/n$. Conditional on $X$, $\hat f_Z$ is an unbiased estimate of $\hat f_{m,\delta}$ with integrated variance $m/k$. So,

$$E \int (\hat f_Z(u) - p(u))^2\,du \preceq m^{-2/r} + \frac{m}{n} + \delta^2 + \frac{m}{k}.$$

Minimizing this subject to the differential privacy constraint yields $E \int (\hat f_Z(u) - p(u))^2\,du = O(n^{-2/(2r+3)})$. □



9.7 Proof of Theorem 4.4 



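(1) Note that $p - \hat f_Z = p - \tilde f + \tilde f - \hat f_Z = p - \tilde f + O_P(m/k)$. When $k \ge n$, the latter error is of lower order than the other terms and may be ignored. Now,

$$p(x) - \tilde f(x) = p(x) - \hat f_m(x) + \hat f_m(x) - \tilde f(x).$$

Thus

$$\int (p(x) - \tilde f(x))^2\,dx \preceq \int (p(x) - \hat f_m(x))^2\,dx + \int (\hat f_m(x) - \tilde f(x))^2\,dx.$$

The expected value of the first term is the usual risk, namely, $O(m^{-2/r} + m/n)$. For the second term, we proceed as follows. Let $\hat p_j = C_j/n$ and $\hat q_j = \tilde D_j / \sum_s \tilde D_s$ where $\tilde D_j = (C_j + \nu_j)_+$. We claim that

$$\max_j |\hat q_j - \hat p_j| = O\left(\frac{\log m}{n}\right)$$

almost surely, for all large $n$. We have

$$\hat q_j = \frac{\tilde D_j/n}{\sum_s \tilde D_s/n} = \frac{\tilde D_j}{n} \cdot \frac{1}{R_n}$$

where $R_n = \left(\sum_{s=1}^{m} (C_s + \nu_s)_+\right)/n$. Now

$$\frac{C_j - |\nu_j|}{n} \le \frac{\tilde D_j}{n} \le \frac{C_j + |\nu_j|}{n}.$$

Therefore,

$$\left|\frac{\tilde D_j}{n} - \hat p_j\right| \le \frac{|\nu_j|}{n} \le \frac{M}{n}$$

where $M = \max\{|\nu_1|, \ldots, |\nu_m|\}$. Let $A > 0$. The density for $\nu_j$ has the form $g(\nu) = (\beta/2)e^{-\beta|\nu|}$. So,

$$P(M > A \log m) \le m\,P(|\nu_j| > A \log m) = \beta m \int_{A \log m}^{\infty} e^{-\beta\nu}\,d\nu = \frac{1}{m^{A\beta - 1}}.$$

By choosing $A$ large enough we have that $M < A \log m$ a.s. for large $n$, by the Borel-Cantelli lemma. Therefore,

$$\left|\frac{\tilde D_j}{n} - \hat p_j\right| \preceq \frac{\log m}{n}.$$

Now we bound $R_n$. Since $\sum_s C_s = n$, we have

$$1 - \frac{\sum_s |\nu_s|}{n} \le R_n \le 1 + \frac{\sum_s |\nu_s|}{n}$$

so that

$$|R_n - 1| \le \frac{\sum_s |\nu_s|}{n} \le \frac{Mm}{n} = O\left(\frac{m \log m}{n}\right) \quad \text{a.s.}$$

Therefore, $1/R_n = 1 + O(m \log m/n)$ and thus

$$\hat q_j = \left(\hat p_j + O\left(\frac{\log m}{n}\right)\right)\left(1 + O\left(\frac{m \log m}{n}\right)\right) = \hat p_j + \hat p_j\, O\left(\frac{m \log m}{n}\right) + O\left(\frac{\log m}{n}\right) + O\left(\frac{m (\log m)^2}{n^2}\right).$$

Next we claim that $\hat p_j = O(1/m)$ a.s. To see this, note that $p_j \le C/m$, by definition of $C$: $1 \le C = \sup_x p(x) < \infty$. Hence, by Bernstein's inequality,

$$
\begin{aligned}
P\left(\hat p_j > \frac{2C}{m}\right) &= P\left(\hat p_j - p_j > \frac{2C}{m} - p_j\right) \le \exp\left\{-\frac{1}{2} \cdot \frac{n((2C/m) - p_j)^2}{p_j + \frac{1}{3}((2C/m) - p_j)}\right\} \\
&\le \exp\left\{-\frac{1}{2} \cdot \frac{nC^2/m^2}{4C/(3m)}\right\} = e^{-3nC/(8m)} \le \frac{1}{n^2}
\end{aligned}
$$

for all $n \ge 16 m \log n/(3C)$. Thus $\hat p_j = O(1/m)$ a.s. for all large $n$. Thus, $\hat q_j - \hat p_j = O(\log m/n)$ almost surely for all large $n$. Hence,

$$E \int (\hat f_m(x) - \tilde f(x))^2\,dx = O\left(\frac{m^2 \log^2 m}{n^2}\right).$$

So the risk is

$$O\left(m^{-2/r} + \frac{m}{n} + \frac{m^2 \log^2 m}{n^2}\right) = O\left(m^{-2/r} + \frac{m}{n}\right)$$

for $n \ge m \log^2 m$. This is the usual risk. Hence, we can choose $m \asymp n^{r/(2+r)}$ to achieve risk $n^{-2/(2+r)}$ for $n$ large enough.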

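(2) Let $\hat F_m$ be the cdf based on the original histogram and let $\tilde F_m$ be the cdf based on the perturbed histogram. We have

$$
\begin{aligned}
E \sup_x |F(x) - \hat F_Z(x)| &\le E \sup_x |F(x) - \hat F_m(x)| + E \sup_x |\hat F_m(x) - \tilde F_m(x)| + E \sup_x |\tilde F_m(x) - \hat F_Z(x)| \\
&\le E \sup_x |F(x) - \hat F_m(x)| + E \sup_x |\hat F_m(x) - \tilde F_m(x)| + O\left(\sqrt{\frac{r \log k}{k}}\right).
\end{aligned}
$$

Since we may take $k$ as large as we like, we can make the last term arbitrarily small. From (22),

$$E \sup_x |F(x) - \hat F_m(x)| = O\left(\sqrt{\frac{r \log n}{n}}\right) + L r^{3/2} m^{-2/r}.$$

Let $\hat f(x) = h^{-r} \sum_{j=1}^{m} \hat p_j I(x \in B_j)$ and let $\tilde f(x) = h^{-r} \sum_{j=1}^{m} \hat q_j I(x \in B_j)$. Let $x' = (u_1 h, \ldots, u_r h)$ where $u_i = \lceil x_i/h \rceil$ for all $i = 1, \ldots, r$. Recall that $B_1, \ldots, B_m$ are the $m$ bins of $\mathcal{X}$ with sides of length $h$. Let $B_x$ denote the cube with the left-most corner being 0 and the right-most corner being $x$. Then for all $x$, we have

$$\left|\int_{B_x} (\hat f(s) - \tilde f(s))\,ds\right| \le \int_{B_x} |\hat f(s) - \tilde f(s)|\,ds \le \int_{B_{x'}} |\hat f(s) - \tilde f(s)|\,ds = \sum_{\ell : B_\ell \subset B_{x'}} |\hat p_\ell - \hat q_\ell| \le m \max_j |\hat p_j - \hat q_j|$$

where we use the fact that there are at most $m$ cubes. Hence,

$$E \sup_{x \in [0,1]^r} |\tilde F_m(x) - \hat F_m(x)| \preceq \frac{m \log m}{n}$$

where we use the fact that $\max_j |\hat p_j - \hat q_j| = O(\log m/n)$ a.s. So,

$$E \sup_x |F(x) - \hat F_Z(x)| = O\left(\sqrt{\frac{r \log n}{n}}\right) + L r^{3/2} m^{-2/r} + O\left(\frac{m \log m}{n}\right).$$

Setting $m \asymp n^{r/(2+r)}$ yields

$$E \sup_x |F(x) - \hat F_Z(x)| = O\left(\max\left\{\frac{\log n}{n^{2/(2+r)}},\ \sqrt{\frac{\log n}{n}}\right\}\right).$$

Hence for $r = 1$, the rate is $O\left(\sqrt{\log n / n}\right)$. For $r \ge 2$, the rate is dominated by the first term inside $O()$, and hence the rate is $O\left(\log n \times n^{-2/(2+r)}\right)$. □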



9.8 Proof of Theorem 5.3 



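Let $B_\epsilon = \{u = (u_1, \ldots, u_k) : \rho(F, \hat F_u) \le \epsilon\}$ where $\hat F_u$ is the empirical distribution based on $u = (u_1, \ldots, u_k) \in \mathcal{X}^k$. Also, let $A_n = \{\rho(\hat F_X, F) \le \epsilon_n/16\}$. For notational simplicity set $\Delta = \Delta_{n,k}$. Then

$$
\begin{aligned}
P(\rho(F, \hat F_Z) > \epsilon_n) &= P(\rho(F, \hat F_Z) > \epsilon_n, A_n) + P(\rho(F, \hat F_Z) > \epsilon_n, A_n^c) \\
&\le P(\rho(F, \hat F_Z) > \epsilon_n, A_n) + P(A_n^c) \\
&= P(\rho(F, \hat F_Z) > \epsilon_n, A_n) + O\left(\frac{1}{n^c}\right).
\end{aligned} \tag{23}
$$

By the triangle inequality $\rho(\hat F_u, \hat F_X) \ge \rho(\hat F_u, F) - \rho(\hat F_X, F)$. Then,

$$
\begin{aligned}
\int_{B_\epsilon^c} g_x(u)\,du &= \int_{B_\epsilon^c} \exp\left(\frac{-\alpha\rho(\hat F_X, \hat F_u)}{2\Delta}\right) du \\
&\le \exp\left(\frac{\alpha\rho(\hat F_X, F)}{2\Delta}\right) \exp\left(\frac{-\alpha\epsilon}{2\Delta}\right) \int_{B_\epsilon^c} du \\
&\le \exp\left(\frac{\alpha\rho(\hat F_X, F)}{2\Delta}\right) \exp\left(\frac{-\alpha\epsilon}{2\Delta}\right).
\end{aligned}
$$

By the triangle inequality, we also have $\rho(\hat F_u, \hat F_X) \le \rho(\hat F_u, F) + \rho(\hat F_X, F)$ and

$$
\begin{aligned}
\int g_x(u)\,du &\ge \int_{B_{\epsilon/2}} g_x(u)\,du = \int_{B_{\epsilon/2}} \exp\left(\frac{-\alpha\rho(\hat F_X, \hat F_u)}{2\Delta}\right) du \\
&\ge \exp\left(\frac{-\alpha\rho(\hat F_X, F)}{2\Delta}\right) \int_{B_{\epsilon/2}} \exp\left(\frac{-\alpha\rho(\hat F_u, F)}{2\Delta}\right) du \\
&\ge \exp\left(\frac{-\alpha\rho(\hat F_X, F)}{2\Delta}\right) \exp\left(\frac{-\alpha\epsilon}{4\Delta}\right) \int_{B_{\epsilon/2}} du \\
&= \exp\left(\frac{-2\alpha\rho(\hat F_X, F) - \alpha\epsilon}{4\Delta}\right) \int_{B_{\epsilon/2}} \frac{p(u_1) \cdots p(u_k)}{p(u_1) \cdots p(u_k)}\,du \\
&\ge \frac{\exp\left(\frac{-2\alpha\rho(\hat F_X, F) - \alpha\epsilon}{4\Delta}\right)}{(\sup_x p(x))^k}\, P(\rho(F, G) \le \epsilon/2)
\end{aligned}
$$

where $G$ is the empirical cdf from a sample of size $k$ drawn from $P$. Thus we have

$$\int_{B_\epsilon^c} h(u \mid x)\,du \le \frac{(\sup_x p(x))^k \exp\left(\frac{\alpha\rho(\hat F_X, F)}{\Delta}\right) \exp\left(\frac{-\alpha\epsilon}{4\Delta}\right)}{P(\rho(F, G) \le \epsilon/2)}.$$

Thus, from (23), taking $\epsilon = \epsilon_n$ and using $\rho(\hat F_X, F) \le \epsilon_n/16$ on $A_n$,

$$P(\rho(F, \hat F_Z) > \epsilon_n) \le \frac{(\sup_x p(x))^k \exp\left(\frac{-3\alpha\epsilon_n}{16\Delta}\right)}{P(\rho(F, G) \le \epsilon_n/2)} + O\left(\frac{1}{n^c}\right).$$

Thus the theorem holds. □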



9.9 Proof of Lemma 5.1 



Proof of Lemma 5.1. We start with KS. By the triangle inequality, we have for all $z \in \mathcal{X}^k$ and for all $x, y \in \mathcal{X}^n$,

$$\rho(\hat F_x, \hat F_z) - \rho(\hat F_y, \hat F_z) \le \rho(\hat F_x, \hat F_y).$$

Notice that changing one entry in $x$ will change $\hat F_x(t)$ by at most $1/n$ at any $t$ by definition, that is,

$$\sup_{t \in [0,1]^r} |\hat F_x(t) - \hat F_y(t)| = \frac{1}{n}.$$

Thus the conclusion holds for the KS-distance. □
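The sensitivity claim is easy to check numerically in the one-dimensional case. The following sketch is illustrative only (the case $r = 1$, with hypothetical function names): it computes the Kolmogorov-Smirnov distance between the empirical cdfs of two samples and confirms that replacing a single entry moves it by no more than $1/n$.

```python
import random


def ks_distance(x, y):
    """sup_t |F_x(t) - F_y(t)| for the empirical cdfs of two 1-d samples."""
    points = sorted(set(x) | set(y))  # the supremum is attained at a data point
    best = 0.0
    for t in points:
        fx = sum(1 for v in x if v <= t) / len(x)
        fy = sum(1 for v in y if v <= t) / len(y)
        best = max(best, abs(fx - fy))
    return best


def max_change_one_entry(n, trials, seed=0):
    """Largest KS distance seen when one entry of a size-n sample is replaced."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x = [rng.random() for _ in range(n)]
        y = list(x)
        y[rng.randrange(n)] = rng.random()  # neighbouring database: one entry differs
        worst = max(worst, ks_distance(x, y))
    return worst


n = 20
print(max_change_one_entry(n, trials=200), 1.0 / n)
```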



9.10 Proof of Theorem 5.4 



We need the following small ball result; see Li and Shao.

Theorem 9.1. Let $r \ge 3$, and let $\{X_t, t \in [0, 1]^r\}$ be the Brownian sheet. Then there exists $0 < C_r < \infty$ such that for all $0 < \epsilon < 1$,

$$\log P\left(\sup_{t \in [0,1]^r} |X_t| \le \epsilon\right) \ge -C_r\, \epsilon^{-2} \log^{2r-1}(1/\epsilon)$$

where $C_r$ depends only on $r$. The same bound holds for a Brownian bridge.

Proof of theorem |5.4[ The Vapnik-Chervonenkis dimension of the class of sets of the 
form {(— oo,a;i] x ■ ■ ■ x (—00,2;^] is r and so by the standard Vapnik-Chervonenkis bound, we 
have for e„, A;„ as specified in the theorem statement, 



P sup \Fx{t) - F{t)\ > < Sn^exp 



[0,1]'- 



16 
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< 



8 exp < — C5 I — ) n^^^ + r logn 

8 exp |-C6\/fc^ ^ cjrlogkr 

8 exp j-Csv^ 



(24) 



for some constants $c_5, c_6, c_7, c_2 > 0$ for $n$ large enough. Thus (10) holds. Now we compute the small ball probability. Note that $\sqrt{k}(\hat F_k - F)$ converges to a Brownian bridge $B_k$ on $[0, 1]^r$. More precisely, from Csörgő and Révész (1975) there exists a sequence of Brownian bridges $B_k$ such that

$$\sup_t \left|\sqrt{k}(\hat F_k - F)(t) - B_k(t)\right| = O\left(\frac{(\log k)^{3/2}}{k^{\gamma}}\right) \quad \text{a.s.} \tag{25}$$

where $\gamma = 1/(2(r + 1))$. It is clear that the RHS of (25) is $o(1)$ a.s. given a fixed $r$. Hence we have for $k = k_n$ and $\epsilon_n$ as chosen in the theorem statement, and for all $\epsilon \ge \epsilon_n$, it holds that



logP(sup \Fz{t) - F{t)\ < e/2) = logP(sup Vk\Fz{t) - F{t)\ < Vke/2) 

t t 

> logP ^sup \Bk{t)\ <Vke-0 {k~^{\ogkf/^)^ (26) 

Vke 



> logP I sup|5fc(t)| < ^ 



(27)
for all large $n$, where (26) follows from (25) and (27) holds given that $\sqrt{k}\,\epsilon \ge \sqrt{k_n}\,\epsilon_n \ge c$ for some constant $c > 1/2$ due to our choice of $k_n$ and $\epsilon_n$. Also, $\Delta \le 1/n$ for KS distance. Hence, by
Theorem [53] and (l24l) . we have for i? = logsup^. p(a;) > 0, 



< Co exp <^ -n — — ^ + 8 exp <^ -C2— — 

< CoeM-CzBK/2) + 8exp j-Ca (28) 

for some constants Co, Ci, C2 and C3, where (l28l) holds when we take w.l.o.g. kn = jq (^) '^^^^ 
and e„ > 2 j^^^^-^/^, given that e„ > 2 j^^^^^'^^^ = and hence ^ > Thus 

the result follows. □ 

Remark 9.2. The constants taken in the proof are arbitrary; indeed, when we take kn = C4 (^) ^^'^ ^^^^ 
and e„ = 32C4 (^) n~^/^ with some constant C4 > 1/16, (|28] ) w/// /zoW wiY/i slightly dijferent 
constants C2, C3. For /;;„ an J e„ as chosen above, it holds that y/k^en x 1. 



9.11 Proofs for Lemma 6.1 and Theorem 6.2 



Throughout this section, we let $\hat p_X$ denote the estimator as defined in (14), which is based on a sample of size $n$ drawn independently from $F$. Similarly, we let $\hat p_k$ denote the same estimator based on an i.i.d. sample $(Y_1, \ldots, Y_k)$ of size $k$ drawn from $F$, with $m_k = k^{1/(2\gamma+1)}$ replacing $m_n$ and $\hat\beta_j = k^{-1} \sum_{i=1}^{k} \psi_j(Y_i)$ in (14). We let $\hat p_Z$ denote the estimator as in (16), based on an i.i.d. sample $Z = (Z_1, \ldots, Z_k)$ of size $k$ drawn from $g_x(z)$ as in (17).

Proof of Lemma 6.1. Without loss of generality, let $X = (x, X_2, \ldots, X_n)$ and $Y = (y, X_2, \ldots, X_n)$ so that $\delta(X, Y) = 1$ and let $Z \in \mathcal{X}^k$. Recall that

1/2 

2 



= {Px{x) -pz{x)) dx 



1/2 

2 
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In particular, let us define u = px ~ Pz and v = py — Pz and thus 



\aX,Z)-aY,Z)\ 



2 ff 2 ^^'^ 

{px{x) -pz{x)) dxj -( / {py{x) -pz{x)) dx 



\u\\i^ - ||i;||^J < \\u-v\ 



^X-PZ- {pY - Pz) 11^2 = \\PX - Pvh^ 



< 



n 



where the first inequality is due to the triangle inequality for the || . ||^ and the last step is due to 



\pxix) - Prix)] 



1 

n 

1 

n 



j=i \i=\ i=\ 



Y,{^,{X^)-i,,{Yr))^,{x) 



X] 



< 



x)\ < 



n 



Hence A < □ 

— n 



Proof of Theorem 6.2. For $u = (u_1, \ldots, u_k) \in \mathcal{X}^k$, we let

$$\hat p_u(x) = 1 + \sum_{j=1}^{m_k} \hat\beta_j \psi_j(x),$$

where $m_k = k^{1/(2\gamma+1)}$ and $\hat\beta_j = k^{-1} \sum_{i=1}^{k} \psi_j(u_i)$. Let $\hat F_u$ be the empirical distribution based on $u$. Our proof follows that of Theorem 5.3 with

$$\rho(F, \hat F_u) = \|p - \hat p_u\|_2 \quad \text{and} \quad \rho(\hat F_X, \hat F_u) = \|\hat p_X - \hat p_u\|_2$$

as defined in (15) for $X = (X_1, \ldots, X_n)$. Now

$$B_\epsilon = \{u = (u_1, \ldots, u_k) : \|p - \hat p_u\|_2 \le \epsilon\}.$$

Thus the corresponding triangle inequalities that we use to replace those in Theorem 5.3 are:

$$
\begin{aligned}
\|\hat p_u - \hat p_X\|_2 &\ge \|\hat p_u - p\|_2 - \|\hat p_X - p\|_2, \\
\|\hat p_u - \hat p_X\|_2 &\le \|\hat p_u - p\|_2 + \|p - \hat p_X\|_2.
\end{aligned}
$$

Standard risk calculations show that (10) holds for some $c > 0$ with $\rho(F, \hat F_X)$ being replaced with $\|\hat p_X - p\|_2$. By Markov's inequality,

$$P(\|\hat p_X - p\|_2 > \epsilon) \le \frac{E\|\hat p_X - p\|_2^2}{\epsilon^2}$$

and (10) follows from the polynomial decay of the mean squared error $E\|\hat p_X - p\|_2^2$. Thus, from (23), for $\hat p_Z = \hat p^*$ as in (16),



(sup^p(a;))'=exp (^) _^ ^ 1 



P(lb-Pfc||,, <e/2) V^' 

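We need to compute the small ball probability. Recall that $\hat p_k$ denotes the estimator based on a sample of size $k$. By Parseval's relation,

$$\int (\hat p_k(x) - p(x))^2\,dx = \sum_{j=1}^{m_k} (\hat\beta_j - \beta_j)^2 + \sum_{j=m_k+1}^{\infty} \beta_j^2.$$

Let $U_i = (\psi_1(X_i) - \beta_1, \ldots, \psi_{m_k}(X_i) - \beta_{m_k})^T$ and $Y_i = \Sigma_k^{-1/2} U_i$ where $\Sigma_k$ is the covariance matrix of $U_i$. Hence, $Y_i$ has mean 0 and identity covariance matrix. Let $\lambda_k$ denote the largest eigenvalue of $\Sigma_k$. From Lemma 9.3 below, $\lambda = \limsup_{k \to \infty} \lambda_k < \infty$. Let $Q = \sum_{j=1}^{m_k} (\hat\beta_j - \beta_j)^2$ and $S = k^{-1/2} \sum_{i=1}^{k} Y_i$. Then, for all large $k$, and any $\delta > 0$,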

2\ _ lalcT^ Q / r r2N \ TO / cT c / ^'^^ A \ TO / cT - - ^'^^ 



P((5<5) = f{S'T.kS <k5')>^[S'S < — ]>^[S'S < 

Afc / V 2A 
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From Theorem 1 . 1 of lBentkusI (I2003|) we have that 



sup I P (5^5 < c) - 



<c) =0 



ml 



Next we use the fact (see iRohde and DuembgenI (|2008l) for example) that P(Xm < m + a) 

1 _ e-aV(4(m+a))_ ^ ^ ^^^-^^/(Z^+l) ^^^^^ y + 1){C^ + 1) 



A;(ej4-C2A;-2V(27+i)) 



2A 



mk, 



since mk = k'^-<+^ = n^/'^^'^'^+'^\ We see that for all large k 



^[\\P-Pkh,< 



Hence 



P ( / ipix)-pkix)fdx < -J 



0=1 

fc(e„/4-C2fc-2^/(27+i)) 



-27/(27+1) 



> 1 — exp 



2A 



4(mfc + a] 



> 1 _ o (A;-(^-i)/(27+i)) 



P(lb-pzll,J>v^) < p(|Ipx-p1I,,>^) + 



(sup^p(a;))^exp 



16A 



p(lb-P.||,, < Vi;/2) 



(sup^. p(x))'^ exp 



16A 



P(lb-Pfc||,, < v^/2) 
(sup,p(x))^exp 



< 



p(lb-Pfc||,, < V^/2) 
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and so for 7 > 1, 



{j{Vz-V?<en) < C2exp (^/clogsupp(a;)^ exp (^ ^y^^^li^^'J^^^^^^ 
= C2 exp ^n^/^ log supp(a;) — 



Q-Cg^V 2(27 + 1), 

= C2exp ^-ac4n(^ 2(27+1) )j _^ 0, 

as n 00 since ^^^^'^-^^ > 1/2, where C2, C3, C4 are some constants. Hence the theorem holds. □ 
Lemma 9.3. Let $\lambda = \limsup_{k \to \infty} \lambda_k$. Then $\lambda < \infty$.

Proof. Recall that the orthonormal basis is $\psi_0, \psi_1, \ldots$, where $\psi_0 = 1$ and $\psi_j(x) = \sqrt{2}\cos(\pi j x)$.
Als o v(x) = 1 + y.^ . (3,^Ax) and E,- A'j'^ < ^- Note that J2T=i = 0(1) for k > 1; 



see ' ' ' " 



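Efromovich (1999). Note that $\Sigma_k$ is the covariance matrix of $\hat\beta$ times $n$. We will use the standard identities $\cos^2(u) = (1 + \cos(2u))/2$ and $\cos(u)\cos(v) = (\cos(u - v) + \cos(u + v))/2$. It follows that $\psi_j^2(x) = 1 + \frac{1}{\sqrt{2}}\psi_{2j}(x)$ and $\psi_j(x)\psi_k(x) = \frac{\psi_{j-k}(x) + \psi_{j+k}(x)}{\sqrt{2}}$. Now $E(\hat\beta_j) = \beta_j$. And

$$n \mathrm{Var}(\hat\beta_j) = \mathrm{Var}(\psi_j(X)) = \int \psi_j^2(x) p(x)\,dx - \beta_j^2.$$

Now $\int \psi_j^2(x) p(x)\,dx = \int \psi_j^2(x)\left(1 + \sum_{\ell=1}^{\infty} \beta_\ell \psi_\ell(x)\right)dx = 1 + \sum_{\ell=1}^{\infty} \beta_\ell \int \psi_\ell(x)\psi_j^2(x)\,dx = 1 + \sum_{\ell=1}^{\infty} \beta_\ell \int \psi_\ell(x)\left(1 + \frac{\psi_{2j}(x)}{\sqrt{2}}\right)dx = 1 + \frac{\beta_{2j}}{\sqrt{2}}$. Thus, $\Sigma_{jj} = 1 + \frac{\beta_{2j}}{\sqrt{2}} - \beta_j^2$. Now consider $j \ne k$.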
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Then 



E(^,(X)^fc(X)) 



ipj{x)ipk{x)p{x)dx 



= /3j / 'ijj]{x)ilJk{x)dx + f3k / ^ / i^j{x)i^k{x)i^t{x)d: 

= y" 'il)2j{x)'il)k{x)dx + y i>2k{x)'il^j{x)dx 
+ 4^ V I {ipj^kix) + ipj+k{x))Mx) 

+ = \j -k\kj^ 2k) + = ^- + ^) 

V2 V2' 

where we used the fact that $\psi_{-j}(x) = \psi_j(x)$ for all $j = 1, 2, \ldots$ and $\int \psi_j(x)\,dx = 0$ for all $j > 0$. So, we have for all $j \in \{1, \ldots, p\}$,



k=l 



< 1 + 



^2, 



< 1 + 

= 0(1). 



x/2 



V2 V2 

k 

oo 

+ m+V2)j2m 



V2 



A;=l 



V2 



Hence, $\limsup_{k \to \infty} \lambda_{\max}(\Sigma_k) \le \|\Sigma_k\|_\infty = O(1)$ and the lemma holds. □

9.12 Proof of Theorem 6.3

The proof is similar to the proof of Theorem 4.4 so we provide a short outline. In particular, the effect of truncation can be shown to be negligible as in the proof of Theorem 4.4. We have $p - \hat p_Z = p - \hat q + \hat q - \hat p_Z = p - \hat q + O_P(m/k)$ and the latter term is negligible for $k \ge n$. Now $p - \hat q = p - \hat p + \hat p - \hat q$. The term $p - \hat p$ is the usual error term and contributes $O(n^{-2\gamma/(2\gamma+1)})$ to the risk. For the second term, $\int (\hat p - \hat q)^2 = \sum_{j=1}^{m} \nu_j^2 = O_P(m/n) = O_P(n^{-2\gamma/(2\gamma+1)})$. □